Stat 541 Final Portfolio
Independent Learning (IL):
These objectives show your ability to seek out new information and adapt to new tools to solve data analysis problems.
[IL-1] Adding new skills:
- I can find and adopt new packages to accomplish tasks.
- I can adapt to different syntax styles (tidy, base, formula style, data.table).
- I can use tutorials, etc. to enhance my understanding of new concepts.
Level: 4
Justification:
Ok, so this may be the first learning objective, however it is the last one I am writing. Why? Well, in part, because I keep scrolling past it :). Let's see: I do not believe we learned how to use kable() in this class, and I implemented this wonderful table formatting library in the cheese lab! In my function for my group's package, I used formula style outside of functions like lm() for the first time. On line 44 you can see me using as.formula() to convert a string to a formula, then right after I have update() in case the intercept needs to be removed. Further down on line 85 I skip the middleman and use the data vectors to create a formula.
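As a quick illustration, here is a minimal sketch of those two formula-building moves (the variable names are placeholders, not the package's actual code):

```r
# Build a formula from a string, then drop the intercept with update()
f <- as.formula("y ~ x")
f_no_int <- update(f, . ~ . - 1)   # becomes y ~ x - 1

# Skip the middleman: build a formula straight from data vectors
x <- rnorm(10)
y <- 2 * x + rnorm(10)
f_direct <- y ~ x   # a formula object capturing the two vectors
lm(f_direct)        # fits using the vectors from the calling environment
```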
[IL-2] Online resources:
- I can use online resources (Google, ChatGPT, StackOverflow) to solve problems, debug, or find new tools.
- I can find source code for similar projects to use as starting points for my own.
- I can read the documentation of an API to figure out how to access data.
Level: 3
Justification:
For the Week 3 Shiny Project, I found the base code for our server, ui, and global files on RStudio's gallery of Shiny apps. Furthermore, I relied on ChatGPT to support some of my modifications to the base code. For example, I noticed that my word cloud plot cut off words when the size or number of words exceeded some limit. After providing my code, I queried "my plot gets cut off when too many words are allowed in my word cloud," and ChatGPT provided three solutions. I recognized that one of them was impossible, and after testing the remaining two found that only one of them solved my issue, but solve it did!
Reproducible Workflow (RW):
These objectives show your ability to produce artifacts and deliverables that are organized, documented, version tracked, and responsibly designed.
[RW-1] File, code, and data management:
- I can use Git and GitHub to track my progress and collaborate (creating repos, cloning, forking, pull requesting).
- I always use R Projects and the {here} package to organize my scripts, notebooks, data, and applications.
Level: 4
Justification:
Look no further than this portfolio's GitHub page itself! Here you can see evidence of creating my own repo, then forking and pull requesting in a neat, orderly manner. A beautiful example of cloning occurred during the Shiny Project, which is wonderfully organized into a "data" folder and an "app" folder housing our R scripts (named fake_news). Within the app folder you can find examples of the {here} package being used in the global notebook, pointing at the "data" folder and then the necessary data set file.
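A minimal sketch of that {here} pattern, assuming the project layout described above (the file name is a stand-in):

```r
library(here)
library(readr)

# Build a path relative to the project root, regardless of where this
# script lives, then read the data set sitting in the "data" folder
news <- read_csv(here("data", "fake_news.csv"))
```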
[RW-2] Notebooks:
- I can use Quarto and/or R Markdown to produce a reproducible notebook and polished rendered document.
- I can use appropriate chunk options (echo, error, cache, etc.) to render my qmd/Rmd quickly and cleanly.
Level: 4
Justification:
The practice lab is the all-in-one assignment, with a variety of different chunk options decorating the Quarto file. From echo, to error, to message, it has it all! Although it has wonderful labels as well, let's look to lab 1 for examples of those. Specifically, part 2 does a perfect job with the label chunk option, naming each chunk appropriately (it even has warning and message options added as needed). Additionally, we can see the rendered output for part 2, and part 1 if you want it, presenting the fruits of our labor (minus the chunk labels, those are hidden away). Jumping over to this portfolio, I have just added the option for a light or dark mode! The light mode uses the 'flatly' theme while the dark mode uses 'darkly'. These can be swapped in the rendered HTML file, in the upper right corner at the top of the file.
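For reference, a hedged sketch of that chunk-option style, with an illustrative label and built-in data (not lifted from any lab):

```{r}
#| label: summary-table
#| echo: false
#| message: false
#| warning: true

# The code runs, but only its output and warnings appear in the render
head(mtcars)
```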
[RW-3] Code style:
- My code is clear, readable, well-organized, and well-commented.
- I can use a package-based workflow to organize my analyses.
Level: 4
Justification:
My code, especially my functions, includes comments (almost) line by line, as well as an overall comment describing the inputs and outputs of the function as a whole. A fresh example (at the time of writing this) is from lab 4 (where did lab 3 go? who knows!). Within this pull request, lines 65 to 155 contain parts of a helper function I wrote. The aforementioned pull request shows my addition of comments so my group mates, and others, can understand the logic of my function. Later on, in a separate pull request, I added an overall header to said function! Looking at the very first chunk in any of my Quarto notebooks, you will see a chunk labeled "set-up" or "packages" containing only the R packages being loaded. Going back to lab 4, the number of packages is quite substantial, and the chunk includes the option "message: false" to hide any load-time output.
Technical Communication (TC):
These objectives show your ability to communicate the processes you have implemented in your code, as well as the data conclusions and results.
[TC-1] Project summaries:
- I can clearly and succinctly summarize the contributions of my project.
- I accurately interpret statistical or modeling results.
- I consider the appropriate scope and impact of my project results.
Level: 3
Justification:
The only example I can think of for this (so far) is the overview document submitted with the Shiny Project. In this document, we summarize our data, our data cleaning process, our files, and finally the app itself. Within the summary of the app, I summarized my contributions (which in turn were my contributions to this project)! Jumping forward to the generative art lab, part of that lab was describing the code and how different arguments affect the art. This is discussed more later, but, well, I do just that. And now jumping back a bit, while creating my function for my group's package I considered who would be using it: introductory statistics students (ideally). Thus, I kept my function's (Simple Linear Regression) inputs simple, as well as the provided outputs.
[TC-2] Documentation:
- I provide ample documentation and tutorials for my custom functions.
- I provide user-friendly guides for my tools and software.
Level: 4
Justification:
For Lab 1, I was the sole contributor to Part 1, the Q-Q Plot custom function. This involved creating a custom Q-Q plotting function and testing it with the cars dataset. My function not only includes comments detailing what each line (or group of lines) of code does, but also has an overview comment. The overview comment lists what data type each input is required to be, along with what the function outputs. For another example, see my get_passes function from lab 4 (as mentioned in RW-3).
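To give a flavor of that commenting style, here is a hedged, toy reconstruction (not my actual Lab 1 code) of a similarly documented function:

```r
# qq_plot: draws a Q-Q style plot comparing a sample to a normal draw
# Input:  x - a numeric vector of observations
# Output: a ggplot object plotting sorted simulated normal values
#         against the sorted sample
library(ggplot2)

qq_plot <- function(x) {
  theoretical <- sort(rnorm(length(x)))     # simulated normal draws, sorted
  observed <- sort(x)                       # sample values, sorted
  df <- data.frame(theoretical, observed)   # intermediate object for ggplot

  p <- ggplot(df, aes(x = theoretical, y = observed)) +
    geom_point() +
    labs(x = "Theoretical quantiles", y = "Sample quantiles")
  return(p)
}

# qq_plot(cars$dist)
```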
Data Manipulation (DM):
These objectives relate to the collection, cleaning, processing, and preparing of datasets for analysis.
[DM-1] Data preparation:
- I can read in datasets to R, including untidy ones.
- I can clean datasets to deal with missing data, typos, poor formatting, etc.
Level: 4
Justification:
In lab 2 I read in a data set containing country names and numbers so we could label our observations in our improved graphics. This data was an Excel file with multiple sheets, so I had to pick the sheet and range of cells to read in, and supply the column name since there were no column headers. Since all the countries and their respective numbers were in one string (i.e. "1=US, 2=EU,…"), I needed to clean the column by splitting it into one observation per country (split on ","), then splitting the number and country into their own columns (split on "="). Finally, I filtered out a messy observation that appeared from this process and selected the two columns I needed. For examples of reading in data sets I could reference practically any lab we did. Then again, in many labs we either scraped the data, used an API, or generated the data ourselves… Anyways, as mentioned elsewhere, the beginning of lab 3 was done by Sean with Ryan and me helping from the sidelines. This includes the data read-in portion, where we used the {here} package, read in data with different delimiters (i.e. not csv), and renamed/mutated different columns.
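A hedged sketch of that cleaning pipeline; the file name, sheet, and cell range are stand-ins, not the lab's actual values:

```r
library(readxl)
library(tidyr)
library(dplyr)

# Pick the sheet and cell range, and supply a column name ourselves
labels <- read_excel("data/country_labels.xlsx", sheet = 2,
                     range = "A1:A1", col_names = "codes")

labels_clean <- labels |>
  separate_rows(codes, sep = ",\\s*") |>                    # one row per "1=US" pair
  separate(codes, into = c("number", "country"), sep = "=") |>
  filter(!is.na(country)) |>                                # drop the messy leftover row
  select(number, country)
```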
[DM-2] Data wrangling:
- I can cleverly use pivoting, grouping, and joining to wrangle data.
- I can use mapping ({purrr}), applying (tapply, lapply, …), and/or iteration (for loops) to perform repeated tasks.
Level: 4
Justification:
Once again we return to lab 4! The first chunk (lines 23 to 56), after package load-in, contains the code for reading, wrangling, and joining the two data sets used. We used select, mutate, and rename to wrangle the data (neither of which was a csv!), then an inner join to combine the two data sets (this could have also been done with a left or right join, however an inner join keeps only observations that appear in both data sets). The next chunk (lines 179 to 198) has an example of the map function in action as well as a pivot_wider()! As a fun aside (for mastery, so I guess not so much an aside), in this commit you can see how I replaced the superseded pmap_dfr function, which I originally used, with the map function.
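A hedged sketch of that swap; get_passes() stands in for my lab 4 helper (its signature is assumed here, and stubbed so the sketch runs):

```r
library(purrr)

# Stub standing in for the lab's API helper, which returns one
# data frame per state
get_passes <- function(state) data.frame(state = state, passes = NA)

states <- c("CA", "OR", "WA")

# Superseded style I originally used:
# passes <- pmap_dfr(list(states), get_passes)

# Current style: map() builds a list of data frames, list_rbind() stacks them
passes <- states |>
  map(get_passes) |>
  list_rbind()
```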
[DM-3] Data formats:
- I can use API urls to access JSON data and convert it to a data frame.
Level: 4
Justification:
So far we have only handled APIs twice: once in a check-in and once in lab 4. The final question for "Check-Ins: New Data Sources" required us to connect to and use the same API as lab 4. While I no longer have the code I used, since it was in the console, I have the proof of my correct response to show that I correctly called the API and converted the JSON object. For lab 4, my group did this section collaboratively (lines 92 to 133). This code ended up being part of the function I wrote, crafting an API call for each state, checking the success of the API call, then transforming the JSON to a data frame. Additionally, as the cherry on top, I added Sys.sleep() directly before the API call to ensure we do not overload the API!
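A minimal sketch of that pattern (the URL is a placeholder, not the lab's actual endpoint):

```r
library(jsonlite)

get_json_df <- function(url) {
  Sys.sleep(1)            # pause before each call so we don't overload the API
  resp <- fromJSON(url)   # fetch the URL and parse the JSON in one step
  as.data.frame(resp)     # flatten the parsed result into a data frame
}

# df <- get_json_df("https://api.example.com/v1/passes?state=CA")
```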
[DM-4] Data collection:
- I can webscrape simple tables and information.
Level: 4
Justification:
Oh you bet I can webscrape a table; let's jump over to, what I have dubbed, the cheese lab. My main contribution to this lab was Part 5: gathering specific information on select cheeses. This entailed taking the url of a cheese found in Part 4, then finding the desired information on the page. What gets returned is a lovely data frame, with one column per feature of interest. Now then, not only does my wonderful scraping happen in a function, perfect for a map(), it takes advantage of a helper function to navigate to the desired features using functions from {rvest}. As a fun feature, I use slice_sample() to randomly select 10 cheeses, so it's a surprise every time (I love randomness!), and allow kable() to make more visually appealing tables of my scraping.
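A hedged sketch of that scraper-plus-helper shape; the URL and CSS selectors are placeholders, not the cheese site's actual structure:

```r
library(rvest)
library(purrr)
library(dplyr)

# Helper: navigate to one feature on the page and pull its text
get_feature <- function(page, selector) {
  page |> html_element(selector) |> html_text2()
}

# Scraper: one cheese URL in, a one-row data frame out
scrape_cheese <- function(url) {
  page <- read_html(url)
  tibble(
    name   = get_feature(page, "h1"),
    milk   = get_feature(page, ".milk"),
    origin = get_feature(page, ".origin")
  )
}

# cheese_df <- map(urls, scrape_cheese) |> list_rbind()
```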
Professional Visualization (PV):
[PV-1] ggplot: grammar of graphics
- I can use less common geometries, including those from ggplot extension packages.
- I can use the correct aesthetics to map variables.
- I understand how geometries inherit aesthetics.
- I can add annotations to my plot.
Level: 4
Justification:
The perfect example of this learning objective can be found in lab 2! My contributions were on visualization 2, where I used {ggplot2} and {ggiraph} to improve a scatter plot. In my graph code (lines 150 to 192) we can see how I defined the aesthetics for my graph in the ggplot() function at the beginning, then took advantage of this inheritance in geom_smooth(). A last-minute addition, talked about more in the generative art section, is my use of less common geometries in the generative art lab! In the first section of that lab I favored geom_violin() and geom_hex() to create my visualization.
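A minimal sketch of that inheritance idea, using built-in mtcars rather than the lab's data:

```r
library(ggplot2)

ggplot(mtcars, aes(x = wt, y = mpg)) +     # aesthetics mapped once, up front...
  geom_point() +                           # ...are inherited by geom_point()...
  geom_smooth(method = "lm", se = FALSE) + # ...and by geom_smooth()
  annotate("text", x = 4.5, y = 30, label = "heavier cars, lower mpg")
```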
[PV-2] ggplot: theme
- I can edit the titles, subtitles, captions, axis labels, etc. to create a clearly labelled plot.
- I can choose colors ("scales") and themes to make a visually pleasing and accessible plot.
Level: 4
Justification:
Let us return to lab 2 from the previous learning objective. In visualization 2, you can see the wonderful title and subtitle I gave my plot, along with the removal of the axis labels. In the theme() function I modified the text type and size of these elements, while in the scale_x/y_continuous() functions I added percentage signs (because my axis variables are percents!). I made a few additional changes in my theme() function, editing the major and minor grid lines to make the graph more pleasing to view while keeping the usefulness of a line to compare points against.
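A hedged sketch of those moves, with illustrative titles and placeholder data (not the lab's):

```r
library(ggplot2)
library(scales)

df <- data.frame(x = runif(50), y = runif(50))    # placeholder percent data

ggplot(df, aes(x, y)) +
  geom_point() +
  labs(title = "A clear title", subtitle = "and a helpful subtitle",
       x = NULL, y = NULL) +                      # axis labels removed
  scale_x_continuous(labels = label_percent()) +  # percent signs on the axes
  scale_y_continuous(labels = label_percent()) +
  theme(plot.title = element_text(size = 14, face = "bold"),
        panel.grid.minor = element_blank())       # soften the grid
```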
[PV-3] Dynamic visualizations:
- I can use a package like {gganimate} to create self-contained gifs.
- I can use a package like {plotly}, {ggplotly}, {leaflet}, {ggiraph}, etc. to make interactive html widgets.
Level: 3
Justification:
For Lab 2, although I created the 2nd visualization, I was the first on my team to create my graph. For my team I pioneered the use of {ggiraph}, figuring out how to create pop-up labels when hovering over data points and how to format them nicely. Looking at Katherine's contributions later, you can see the same general formatting in her use of {ggiraph}. Additionally, I improved upon the leaflet Kai created, adding labels on hover and setting the initial view. I never went back to use {gganimate}, but if you want I could reference other classes where I used it ;)
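A minimal sketch of that hover-tooltip pattern with {ggiraph} (built-in data and placeholder tooltips, not the lab's):

```r
library(ggplot2)
library(ggiraph)

p <- ggplot(mtcars, aes(x = wt, y = mpg,
                        tooltip = rownames(mtcars))) +  # pop-up label on hover
  geom_point_interactive()

girafe(ggobj = p)   # renders as an interactive HTML widget
```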
[PV-4] Shiny:
- I can create a functional Shiny app.
- I understand the principle of reactivity, and how to use it.
Level: 4
Justification:
Well, this is just asking me to link all of the Shiny Project, so I will! I laid the foundations for my group's Shiny app, as seen in the following pull request. Although much of this code was from the RStudio gallery (as noted in IL-2), I made substantial edits scattered throughout my following pull requests to the ui, server, and global files. Regarding reactivity, my word cloud plot waits for the user to select a category to filter on and then press a button before updating. This functionality was expanded to the two additional plots my group mates created. We ran into an error with their plots where the titles would update prior to the "change" button being pressed. To solve this issue, we used eventReactive() to store the title so it would wait until the button was pressed.
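A hedged sketch of that eventReactive() pattern: the output only updates when the button is pressed (input names and the plot are illustrative, not our app's code):

```r
library(shiny)

ui <- fluidPage(
  selectInput("category", "Category", choices = c("fake", "real")),
  actionButton("change", "Change"),
  plotOutput("cloud")
)

server <- function(input, output) {
  # Freeze the selected category until the button is clicked
  chosen <- eventReactive(input$change, input$category)

  output$cloud <- renderPlot({
    plot.new()
    title(main = paste("Word cloud:", chosen()))  # title waits for the button too
  })
}

# shinyApp(ui, server)
```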
Software Development (SD):
These objectives relate to your ability to develop correct, usable, well-designed, and sophisticated software in the R language.
[SD-1] R programming language details:
- I understand non-standard evaluation (aka “Tidy Eval” or “unquoted objects”), and I can use tunneling in my functions.
- I understand functional programming, and I can use functions as objects in my code design.
Level: 4
Justification:
I had never used tunneling prior to creating my function for my group's R package; however, I now love tunneling, since it allows me to pass in the name of a column without putting it in quotation marks! Scattered throughout my function and helper functions you can see examples of my usage of tunneling; for the best example I would like to draw your attention to the create_scatter_plot() function, specifically the ggplot() code within it. Additionally, this code has two helper functions: create_scatter_plot() and test_assumptions(). The first is used to, well, create a scatter plot in my main function, while the latter is used to test model assumptions, returning the necessary plots.
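A minimal sketch of tunneling: {{ }} passes an unquoted column name through my wrapper into aes(). The function name matches the helper above, but the signature and body here are a toy reconstruction:

```r
library(ggplot2)

create_scatter_plot <- function(data, x, y) {
  ggplot(data, aes(x = {{ x }}, y = {{ y }})) +  # tunnel the bare column names
    geom_point()
}

create_scatter_plot(mtcars, wt, mpg)   # no quotation marks needed
```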
[SD-2] Package creation:
- I can create a folder that is installable as an R package, possibly using {usethis} helper functions.
- I can document my functions using {roxygen2} style commenting.
- I can write and run unit tests using {testthat}.
- I can design a package that is user-friendly and has well-designed functions.
Level: 4
Justification:
For my group's package project, I was the person tasked with creating the initial set-up. To do so, I used the {usethis} package. The function I wrote (slr()), as well as its helper functions, all use {roxygen2} style commenting: describing what the function does, its input parameters, and its return object, as well as what helper functions are imported from outside packages. I created test cases using {testthat} for my function and helper functions, taking care to ensure the full scope of my return results is tested (especially for my test_assumptions() helper function). Lastly, I went through all of my group mates' functions at the end to standardize our function and parameter naming conventions to create a more user-friendly package.
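A hedged sketch of that workflow on a toy function; slr()'s real signature, documentation, and tests live in the package, so everything below is illustrative only:

```r
# Initial set-up calls (run once, interactively):
# usethis::create_package("slrpkg"); usethis::use_testthat()

#' Fit a simple linear regression
#'
#' @param formula A formula of the form y ~ x.
#' @param data A data frame containing the variables in the formula.
#'
#' @return An object of class "lm".
#' @importFrom stats lm
#' @export
slr <- function(formula, data) {
  stats::lm(formula, data = data)
}

# A matching {testthat} unit test:
library(testthat)
test_that("slr returns an lm object", {
  expect_s3_class(slr(mpg ~ wt, data = mtcars), "lm")
})
```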
Algorithms and Iteration (AI):
These objectives ask you to design code-based approaches to statistical computing problems, usually involving iteration to a stopping condition.
[AI-1] Creating an algorithm:
- I can invent and implement my own iterative algorithm.
- I have built in checks for possible problems or extreme cases in the algorithm.
Level: 4
Justification:
My function to call the API in lab 4 (lines 61 to 134) ticks all of the boxes for this learning objective. It begins with stopping conditions for each parameter to ensure the correct data type is being inputted (added in a later pull request). Then, depending on the output from the API, the function might increase the hours value and call the API again (increasing retry_count by 1), retry the API if an error occurred (also increasing retry_count by 1), or, having collected the correct information, return the output. This occurs in a while loop, which continues until retry_count equals the input parameter max_retries, the hours parameter equals 72, or the correct information is obtained and returned. Additionally, since 72 hours is the maximum the API can handle, I have a case_when() checking that hours never climbs above 72, and once 72 hours has been checked the function terminates with a message describing why.
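A minimal reconstruction of that control flow; the real function uses case_when() and the lab's API, while here the call, the growth rule, and the parameter defaults are simplified placeholders:

```r
call_with_retries <- function(url, hours = 24, max_retries = 5) {
  # Stopping conditions on each parameter's data type
  stopifnot(is.character(url), is.numeric(hours), is.numeric(max_retries))

  retry_count <- 0
  while (retry_count < max_retries) {
    # Attempt the API call; an error becomes NULL instead of crashing
    result <- tryCatch(jsonlite::fromJSON(url), error = function(e) NULL)

    if (!is.null(result) && length(result) > 0) {
      return(result)                     # success: return the collected data
    }
    if (hours >= 72) {                   # 72 is the API's maximum window
      message("Reached the 72-hour cap without a usable response.")
      return(NULL)
    }
    hours <- min(hours * 2, 72)          # widen the window, never past 72
    retry_count <- retry_count + 1       # count the retry either way
  }
  message("Hit max_retries without a usable response.")
  NULL
}
```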
[AI-2] Generative art:
- I can apply a variety of generative art functions to make a visually pleasing piece.
- I can explain why particular changes to the code result in particular differences in the visualization.
Level: 4
Justification:
Hmmmm, I wonder what lab I should use for this… I know, the cheese la- I mean generative art lab! For my first piece of fabulous art, I used ggplot() to create an incredibly visually pleasing egg. Meanwhile, for my second piece, I used a couple of functions (one of which does use ggplot()) to create my favorite of the two pieces. Both art pieces have descriptions underneath detailing how different input parameters would change the art (along with an interesting description of the art itself).
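In the same spirit, a hedged sketch of a tiny generative piece (not the lab's actual code): a random walk drawn with ggplot(), where the seed, alpha, and line width are the knobs that change the art.

```r
library(ggplot2)

set.seed(541)                     # changing the seed changes the whole piece
art <- data.frame(
  x = cumsum(rnorm(500)),         # a random walk in x...
  y = cumsum(rnorm(500)),         # ...and in y
  step = 1:500
)

ggplot(art, aes(x, y, colour = step)) +
  geom_path(linewidth = 0.4, alpha = 0.8) +  # alpha/linewidth shift the texture
  scale_colour_viridis_c() +
  theme_void() +
  theme(legend.position = "none")
```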
Code Design (CD):
These objectives relate to making wise or clever choices in how you implement a procedure in code; including creating functions and objects, or thinking about the clarity and efficiency of processes.
[CD-1] Speed and Efficiency:
- I can recognize moments of possible slowdown in my code, and use built-in functions or parallelizing to speed them up.
- I always use and design vectorized functions whenever possible.
Level: 4
Justification:
Oh, do I have the commit message for you: "vectorized for loop." In class, Sean was doing the coding while Ryan and I watched and thought up what needed to be done. After seeing this for loop, although beautiful in its own way, I felt the urge to improve it. This commit shows the removal of a slow process and the introduction of a vectorized process via pmap_dfr(). This was eventually switched to a different process once I learned the function was superseded, but that is explained in a different learning objective. If you are looking for an example where I wrote the vectorized function myself, then look no further than the very beginning of the class! No, really, look at lab 1. Since it has been a while since I talked about lab 1 (at least in the order of writing these justifications; who knows when you last read about it), I was in charge of part 1, the Q-Q Plot Function. As noted in my description of the function, the input parameter x is (gasp) a vector. Furthermore, within the function I use vectorized functions such as rnorm() and sort().
[CD-2] Object handling:
- I can make reasonable choices in my code design about when to save intermediate objects.
- I can convert objects between types and structures as needed.
Level: 4
Justification:
To start, I did not write the code myself but was present and helping when it was written. As I have previously explained, my group started the API lab (lab 3) during class with only Sean coding, and here we converted the return from the API (JSON) to a data frame. Another example of converting my data is from the cheese lab. Here, I take a list structure, use unlist() to convert it to a vector, then a combination of matrix() and data.frame() to convert the vector to a data frame. I believe one of my best-built functions is my function from lab 1 part 1. Throughout the function I save my rnorm() values as well as my sorted values in their own variables. This allows my ggplot() code to be more readable, since we do not see rnorm() or sort() in the arguments. My only critique here would be that I do not need to save the ggplot() output and then use a return statement, since R would have returned the plot automatically.
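A hedged sketch of that list-to-data-frame conversion (the list contents are placeholders, not the cheese lab's data):

```r
raw <- list(name = "Brie", milk = "cow", origin = "France")

v <- unlist(raw)                                 # flatten the list to a named vector
m <- matrix(v, nrow = 1,
            dimnames = list(NULL, names(raw)))   # reshape into one row
df <- data.frame(m)                              # convert the matrix to a data frame
df
```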
[CD-3] Supporting functions:
- I write helper/shortcut functions to streamline repeated tasks and make my code easier to read.
- I use intermediate functions to streamline repeated or looping processes.
Level: 4
Justification:
I am writing this after writing the learning objective below (this learning objective was moved, so ah-ha!), so back to lab 4! The function I wrote there is actually a helper function for looping through each state. By creating this helper function, my code is much cleaner than it would be otherwise. Additionally, my cheese lab code uses a helper function. My part, again, was to get specific details on a subset of cheeses. Rather than go and copy-paste my retrieval code a bunch of times, I created a function!
Summary
Grade
Based on the summary plot above, I believe I have earned an A in STAT 541.
Justification:
Personally, I think the graph speaks for itself (both figuratively and literally), but let's see what I can add. I started with a strong base for this class, having taken STAT 150 with Dr. Bodwin and STAT 331 with you! This class not only reinforced what I learned in those classes, but built upon my skills. I learned how to use GitHub to save my work step by step, as well as how to take advantage of Quarto. I take pride in being able to write clean, efficient functions and craft useful visualizations, both of which are on full display in each and every lab (although not necessarily both at the same time). I took my first steps into creating Shiny apps and packages during this class. While the documentation of my group's package could use a bit of sprucing up, the other aspects are all up to par. Overall, I believe that I have a strong grasp of the material taught throughout the quarter, whether I learned it outside this class or during it.